ABSTRACT


In  this  paper,  we  present  a  recommended  quantitative  approach  for  analyzing  the  concept  of  isoperformance. 
The  ideas  outlined  here  rely  upon  the  Bayesian  version  of  model  evaluation.  We  define  models  as  hypotheses 
about  the  probabilities  of  subjects  being  categorized  by  a  combination  of  predictor  variables  and  criterion  variables. 
From this foundation, a computational formula is derived whose value can be compared to a $\chi^2$ distribution. For
example,  we  are  often  interested  in  calculating  the  probability  of  a  subject  failing  during  some  phase  of  flight 
training  given  that  we  have  information  on  certain  predictor  variables.  We  would  like  to  ascertain  whether  the 
extra  information  contained  in  such  predictor  variables  is  useful.  If  it  is  useful,  then  it  enables  us  to  predict  the 
probability  of  failure  for  any  given  student.  This  ability  to  predict  a  change  in  the  probability  of  failure,  either  in 
the  upwards  or  downwards  direction,  is  very  helpful  to  managers  and  decision  makers  in  the  training  community. 
In addition, these techniques can help answer the question of whether a candidate for flight training can “trade off”
a  high  score  on  one  predictor  variable  for  a  low  score  on  a  different  predictor  variable.  In  particular,  we  would  like 
to  investigate  the  possibility  of  trading  off  different  classes  of  predictor  variables,  say  cognitive  information 
processing  variables  and  personality  variables,  and  still  achieve  the  same  level  of  performance.  The  maximum 
entropy  principle  is  used  as  a  systematic  disciplined  approach  to  find  parsimonious  models. 
INTRODUCTION


In  this  paper,  we  present  a  recommended  quantitative  approach  for  analyzing  the  concept  of  isoperformance. 

An  article  in  the  journal  Human  Factors  by  Jones  and  Kennedy  [1]  prompted  our  current  interest  in  applying 
isoperformance  to  selection  and  training  issues. 

The  ideas  outlined  here  rely  upon  the  Bayesian  version  of  model  evaluation.  We  define  models  as  hypotheses 
about  the  probabilities  of  subjects  being  categorized  by  a  combination  of  predictor  variables  and  criterion  variables. 
From this foundation, a computational formula is derived whose value can be compared to a $\chi^2$ distribution.

If this easily computed value falls into the upper 5% region of a $\chi^2$ distribution with the appropriate degrees of
freedom,  then  we  reject  the  tentative  model.  On  the  other  hand,  if  the  value  falls  into  the  lower  95%  region  of  the 
distribution,  then  we  accept  the  model.  Once  a  model  is  found  that  can  be  accepted,  a  few  elementary  rules  from 
probability  theory  can  be  used  to  calculate  the  probability  of  events  involved  in  isoperformance  curves. 

For  example,  we  are  often  interested  in  calculating  the  probability  of  a  subject  failing  during  some  phase  of 
flight  training  given  that  we  have  information  on  certain  predictor  variables.  Alternatively,  one  can  focus  on  the 
positive  and  say  that  we  are  interested  in  the  probability  of  a  subject  passing  flight  training.  We  would  like  to 
ascertain  whether  the  extra  information  contained  in  such  predictor  variables  is  useful.  If  it  is  useful,  then  it 
enables  us  to  predict  the  probability  of  failure  (or  passing)  for  any  given  student.  This  ability  to  predict  a  change 
in  the  probability  of  failure,  either  in  the  upwards  or  downwards  direction,  is  very  helpful  to  managers  and 
decision  makers  in  the  training  community. 

In  addition,  these  techniques  can  help  answer  the  question  of  whether  a  candidate  for  flight  training  can 
“trade off” a high score on one predictor variable for a low score on a different predictor variable. In particular, we
would  like  to  investigate  the  possibility  of  trading  off  different  classes  of  predictor  variables,  say  cognitive 
information  processing  variables  and  personality  variables,  and  still  achieve  the  same  level  of  performance. 

THE  DATA  BASE 

The  purpose  of  this  paper  is  to  provide  the  general  quantitative  foundations  for  analyzing  isoperformance. 

From  time  to  time,  we  shall  employ  fictitious  data  to  illustrate  the  formulas.  The  analysis  of  actual  data  using  these 
techniques will be presented in a subsequent report [2]. The fictitious data do, nonetheless, give some general
idea  of  the  actual  data  base  we  will  be  analyzing  in  the  future  for  the  isoperformance  project. 

As  part  of  another  project  called  the  Pilot  Prediction  System  (PPS),  we  have  constructed  a  rather  large  and 
comprehensive  data  base  consisting  of  various  selection  and  training  variables.  A  subset  of  this  data  base  contains 
information  on  over  a  thousand  Navy  and  Marine  Corps  candidates  who  entered  pilot  flight  training  from  1993  to 
early  1998. 

Scores  on  the  various  subtests  of  the  Aviation  Selection  Test  Battery  (ASTB)  and  all  the  grades  from  the 
academic  ground  school  (API  -  Aviation  Preflight  Indoctrination)  portion  of  training  prior  to  actual  flight  training 
are  part  of  this  data  base.  We  will  concentrate  on  one  of  the  subtests  from  the  ASTB,  the  Pilot  Biographical 
Inventory  (PBI),  and  the  final  overall  grade  from  API  called  the  Navy  Standard  Score  (NSS). 

The raw score on the PBI is transformed into one of nine discrete categories so that PBI = 1, 2, ..., 9, with 1 being the lowest score and 9 the highest score. There are no candidates in the data base with PBI = 1, so PBI will consist of eight categories. The API NSS is transformed into one of six discrete categories, API = 1, 2, ..., 6, with, again, 1 representing low scores and 6 representing high scores from ground school. Thus, PBI and API represent the two predictor variables.

One  criterion  variable  will  be  used  in  the  subsequent  analysis.  This  criterion  variable  simply  records  whether  a 
candidate  failed  some  later  phase  of  flight  training  after  API.  The  crux  of  the  analysis  then  centers  naturally  upon 
the  probability  of  failure  given  information  about  two  predictor  variables. 
CONTINGENCY TABLES


As  just  mentioned,  we  will  eventually  analyze  data  from  the  PPS  data  base  in  our  first  assessment  of 
isoperformance  curves.  The  following  schema  will  be  used  to  set  up  the  statistical  derivations  as  detailed  in  later 
sections.  Consider  n  cells  that  represent  the  n  different  ways  that  an  event  could  happen.  For  us,  these  n  cells 
represent  the  various  combinations  of  categories  for  a  given  number  of  predictor  variables  and  criterion  variables. 

For  example,  a  subject  in  the  data  base  is  classified  into  one  of  eight  categories  on  the  PBI  predictor  variable, 
one  of  six  categories  on  the  API  final  grade  predictor  variable,  and  one  of  two  categories  to  indicate  success  or 
failure  in  some  phase  of  flight  training.  The  total  number  of  ways  that  a  subject  could  be  categorized  given  these 
three variables is n. Therefore, n = 8 × 6 × 2 = 96 different cells. The first cell would contain all those subjects with scores PBI = 2, API = 1, ATTRITE = 0; the second cell all those subjects with scores PBI = 2, API = 2, ATTRITE = 0; the jth cell all those subjects with scores PBI = 6, API = 4, ATTRITE = 1; and the 96th and last cell all those subjects with scores PBI = 9, API = 6, ATTRITE = 1. These n = 96 cells can be arranged in any
way  that  is  convenient. 
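The cell bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function name `cell_number` is hypothetical, and the ordering (API varying fastest, then PBI, then ATTRITE) is an assumption chosen because it reproduces the first, second, and 96th cells named in the text. As the text says, any convenient arrangement would do.

```python
from itertools import product

# Category levels taken from the text: PBI = 2..9 (no candidate has PBI = 1),
# API = 1..6, and ATTRITE = 0 or 1.
PBI_LEVELS = range(2, 10)
API_LEVELS = range(1, 7)
ATTRITE_LEVELS = (0, 1)


def cell_number(pbi: int, api: int, attrite: int) -> int:
    """Return the 1-based cell number of a category combination.

    ASSUMED ordering (not stated in full by the source): API varies fastest,
    then PBI, then ATTRITE.
    """
    if pbi not in PBI_LEVELS or api not in API_LEVELS or attrite not in ATTRITE_LEVELS:
        raise ValueError("category value outside the levels described in the text")
    return attrite * 48 + (pbi - 2) * 6 + (api - 1) + 1


# Enumerate every combination once to confirm the count n = 8 x 6 x 2.
all_cells = [
    cell_number(p, a, t)
    for t, p, a in product(ATTRITE_LEVELS, PBI_LEVELS, API_LEVELS)
]
n = len(all_cells)
```

Under this assumed ordering, `cell_number(2, 1, 0)` is the first cell and `cell_number(9, 6, 1)` is the last.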

One  traditional  and  convenient  way  of  arranging  these  n  cells  is  a  two-dimensional  table  of  rows  and  columns. 
In  this  arrangement,  the  n  cells  are  called  a  cross-tabulation  or  contingency  table.  Using  our  previous  example,  the 
n  =  96  cells  could  be  displayed  as  two  contingency  tables  each  with  eight  rows  for  the  eight  categories  of  the  PBI 
and  six  columns  for  the  six  categories  of  the  API  final  grade.  The  first  contingency  table  consists  of  all  those 
subjects  who  failed  some  phase  of  flight  training  (ATTRITE  =  0)  while  the  second  consists  of  all  those  subjects 
who  passed  all  phases  of  flight  training  (ATTRITE  =  1). 

The symbol N will be used to indicate the total number of subjects allocated to the n cells. The number of subjects in the ith cell will be labeled $N_i$. Therefore,

$$N = \sum_{i=1}^{n} N_i.$$

Attached to each cell is a parameter, $Q_i$, that represents the probability for a subject to fall into the ith cell. The whole purpose of analyzing the contingency tables is to find values for the $Q_i$ that are a good fit to the empirical frequency data in the PPS data base. Each separate consideration of a set of potential $Q_i$ will be called a model and given the notation $M_A$, $M_B$, $M_C$, and so on.

See Fig. 1 for a sketch of the salient points made in the above discussion. Two 8 × 6 contingency tables are shown. The table on the left consists of all the subjects in the data base who failed some phase of flight training, while the table on the right consists of those subjects who passed all phases of flight training. Each cell is numbered, starting with cell 1 and ending with cell 96. The actual number of subjects falling into cell 29 is $N_{29}$. The probability for a subject to be categorized into cell 16 is $Q_{16}$. The jth cell consists of the intersection ATTRITE = 1, PBI = 6, and API = 4. There are a total of

$$N = \sum_{i=1}^{n} N_i = 1{,}120$$

subjects in the data base who can be placed into one, and only one, of these 96 cells.

The models, $M_A$, $M_B$, $M_C$, and so on, will embody various hypotheses regarding isoperformance curves. Such interesting hypotheses will concern independence, or the lack thereof, among the various $Q_i$. Other hypotheses to be investigated concern an increase of the $Q_i$ with an increase in a variable score, and most especially, “tradeoffs”
among  certain  of  the  Qi.  Addressing  such  hypotheses  will  allow  us  to  accept  or  reject  the  idea  that  subjects  can 
achieve  equal  probability  of  success  in  flight  training  by  trading  off  high  scores  on  one  predictor  variable  with  low 
scores  on  another  predictor  variable.  We  will  always  be  guided  by  the  principle  of  scientific  parsimony,  sometimes 
called  “Occam’s  razor,”  to  seek  the  simplest  models  that  fit  the  data. 
Figure 1: A sketch of a convenient arrangement of n = 96 cells into two 8 × 6 contingency tables.


THE  BAYESIAN  FORMALISM  FOR  MODEL  EVALUATION 


We first write Bayes’s Formula for the posterior probability for any given model, say model $M_A$. Then

$$P(M_A \mid D, I) = \frac{P(D \mid M_A, I) \, P(M_A \mid I)}{P(D \mid I)} \qquad (1)$$


where D stands for the observed frequency data and I stands for all the background assumptions. The posterior probability for model $M_A$ as conditioned on the truth of D and I is given on the left-hand side of Equation (1). The right-hand side consists of the likelihood of the data conditioned on the truth of model $M_A$ times the prior probability of model $M_A$. The likelihood times prior component in the numerator is divided by the probability of the data. The denominator is the sum of all the terms that could appear in the numerator and thus is a sum over all possible models. In the future, we shall drop reference to the background assumptions, I, to shorten the equations.


The Bayesian approach actually compares the ratio of posterior probabilities for any two models, say model $M_A$ and $M_B$. This allows us to remove the complicated sum, P(D), from further consideration:

$$\frac{P(M_A \mid D)}{P(M_B \mid D)} = \frac{P(D \mid M_A) \, P(M_A)}{P(D \mid M_B) \, P(M_B)}. \qquad (2)$$


Another  assumption  is  usually  introduced  at  this  point.  The  prior  probability  of  all  models  is  considered  to  be 
equal.  No  favor  or  bias  is  shown  for  a  model  when  compared  with  any  other  model.  Under  this  assumption,  the 
ratio  of  posterior  probabilities  for  any  two  models  reduces  to  the  ratio  of  their  respective  likelihoods  under  each 
given  model: 

$$\frac{P(M_A \mid D)}{P(M_B \mid D)} = \frac{P(D \mid M_A)}{P(D \mid M_B)}. \qquad (3)$$


Now  the  question  is,  “How  do  we  find  the  likelihood  of  the  data  given  a  particular  model?”  To  answer  this 
question,  we  again  invoke  Bayes’s  Formula,  but  this  time  at  a  lower  level.  We  now  write  down  Bayes’s  Formula 
for the posterior probability for any given contingency table based on the data and a given model. The notation $F_j$ is used for the jth contingency table:

$$P(F_j \mid D, M_A) = \frac{P(D \mid M_A, F_j) \, P(F_j \mid M_A)}{P(D \mid M_A)} \qquad (4)$$


The  structure  of  Bayes’s  Formula  is  the  same  as  that  in  Equation  (1),  but  it  is  now  expressed  as  the  posterior 
probability  for  the  jth  contingency  table  as  conditioned  on  the  assumed  truth  of  a  given  model.  Observe  that  the 
denominator  in  Equation  (4)  is  the  very  expression  needed  to  solve  Equation  (3). 

Since the term $P(D \mid M_A)$ in the denominator of Equation (4) is the sum of all the terms that could appear in the numerator, it is written explicitly as

$$P(D \mid M_A) = \sum_{i=1}^{K} P(D \mid M_A, F_i) \, P(F_i \mid M_A). \qquad (5)$$

This  is  a  sum  over  all  K  possible  contingency  tables  that  could  arise  from  considering  N  subjects  allocated  to  n 
cells.  Equation  (5)  is  also  an  axiom  from  probability  theory  and  is  given  the  name  marginalization. 


The final step among these strictly Bayesian manipulations is to determine $P(D \mid M_A)$ and $P(D \mid M_B)$. This
turns  out  to  be  a  relatively  simple  problem  because  we  are  dealing  with  noise-free  data.  We  assume  that  we  have 
been  careful  enough  to  correctly  record  the  various  categorical  variables  so  that  we  do  not  have  to  account  for  any 
attached  error  in  the  frequency  counts  for  these  variables.  The  likelihood  for  the  jth  contingency  table  is  therefore 
equal  to  1  when  the  frequency  data  match  the  numbers  in  the  contingency  table  and  0  for  any  contingency  table 
where  the  data  do  not  match  the  numbers  in  the  table.  Symbolically,  this  means 


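$$P(F_j \mid D, M_A) = \frac{1 \times P(F_j \mid M_A)}{\left[\, 1 \times P(F_j \mid M_A) \,\right] + \left[\, \sum_{i \neq j} 0 \times P(F_i \mid M_A) \,\right]}. \qquad (6)$$

The denominator in Equation (6), the term we are seeking, therefore simplifies tremendously, reducing to

$$P(D \mid M_A) = P(F_j \mid M_A). \qquad (7)$$

Likewise,

$$P(D \mid M_B) = P(F_j \mid M_B). \qquad (8)$$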

This  completes  the  section  on  the  Bayesian  manipulations.  The  next  section  continues  the  derivation  through  to  the 
point  where  we  can  write  computer  programs  to  analyze  actual  data. 
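Equations (3), (7), and (8) say that, under equal priors, comparing two models comes down to comparing the probability each assigns to the observed table. The sketch below assumes the standard multinomial form for that probability, with N subjects allocated independently to n cells with probabilities $Q_i$. The report's own derivation is in its Appendix and may be organized differently, and the function names are hypothetical.

```python
import math


def log_table_probability(counts, q):
    """Log of the probability of the observed table F_j under a model.

    ASSUMPTION: the multinomial form
        P(F_j | M) = N! / (N_1! ... N_n!) * prod_i Q_i ** N_i.
    """
    if len(counts) != len(q):
        raise ValueError("counts and q must have the same length")
    n_total = sum(counts)
    log_p = math.lgamma(n_total + 1)          # log N!
    for n_i, q_i in zip(counts, q):
        log_p -= math.lgamma(n_i + 1)         # minus log N_i!
        if n_i > 0:
            if q_i <= 0:
                return float("-inf")          # model forbids an observed cell
            log_p += n_i * math.log(q_i)
    return log_p


def posterior_odds(counts, q_a, q_b):
    """Ratio P(M_A | D) / P(M_B | D) under equal priors, as in Equation (3)."""
    return math.exp(
        log_table_probability(counts, q_a) - log_table_probability(counts, q_b)
    )
```

Working with logarithms avoids underflow when N is large. For example, `posterior_odds([4, 16], [0.2, 0.8], [0.5, 0.5])` exceeds 1, favoring the model whose probabilities match the (made-up) counts.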


FORMULA  FOR  COMPUTING  THE  ACCEPTANCE  OR  REJECTION  OF  ANY  GIVEN  MODEL 


As  the  derivation  for  the  actual  formula  used  to  compute  whether  to  accept  or  reject  a  model  is  rather  long  and 
involved,  we  relegate  the  mathematical  derivation  to  the  Appendix.  The  interested  reader  may  go  there  for  all  the 
details.  Only  the  final  formula  is  presented  here  as  Equation  (9). 


$$2N \sum_{i=1}^{n} f_i \ln\!\left(\frac{f_i}{Q_i}\right) \sim \chi^2(\nu \ \text{df}). \qquad (9)$$


As mentioned before, N is the total number of subjects allocated to the contingency tables. The observed relative frequency for the ith cell is given the notation $f_i$ and is equal to $N_i/N$. $Q_i$ refers to any model for assigning the probabilities that we might propose to test. A superscript will be attached to the $Q_i$ to identify which model is being discussed so that $Q_i^A$ will mean the probabilities for each of the n cells under Model A. Likewise, $Q_i^B$ will mean the probabilities for each of the n cells under Model B, and so on. Equation (9) says that the number computed on the left-hand side, which cannot be negative, will be distributed according to the $\chi^2$ distribution with $\nu$ degrees of freedom. We adopt the usual convention of rejecting any proposed model for the $Q_i$ if the value computed on the left-hand side falls into the upper 5% region of the $\chi^2$ distribution with the appropriate degrees of freedom.
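The left-hand side of Equation (9) and the rejection rule can be sketched as follows. The function names are hypothetical, the counts in the usage note are made up for illustration, and the critical value must be supplied by the caller from a $\chi^2$ table. Cells with $N_i = 0$ are skipped, using the convention that $f \ln f \to 0$ as $f \to 0$.

```python
import math


def model_statistic(counts, q):
    """Left-hand side of Equation (9): 2 N sum_i f_i ln(f_i / Q_i), f_i = N_i / N."""
    if len(counts) != len(q):
        raise ValueError("counts and q must have the same length")
    n_total = sum(counts)
    total = 0.0
    for n_i, q_i in zip(counts, q):
        if n_i == 0:
            continue                      # empty cells contribute nothing
        if q_i <= 0:
            return math.inf               # model forbids an observed cell
        f_i = n_i / n_total
        total += f_i * math.log(f_i / q_i)
    return 2.0 * n_total * total


def reject_model(statistic, critical_value):
    """Reject when the statistic falls in the upper tail beyond the critical value."""
    return statistic > critical_value
```

As a made-up two-cell illustration, `model_statistic([30, 70], [0.5, 0.5])` is about 16.46, which exceeds the 5% critical value of 3.84 for 1 df, so the equal-probability model would be rejected for those counts.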

Numerical  Examples 

In  this  section,  we  present  some  simple  numerical  examples  to  illustrate  the  use  of  Equation  (9).  Consider  the 
situation of n = 8 cells, conveniently arranged into two 2 × 2 contingency tables. The first table consists of those subjects who failed some phase of flight training, while the second table consists of those who passed all stages of flight training. The total number of subjects in the data base is N = 100. Figure 2 shows these two tables with the actual frequencies, $N_i$, filled in for all eight cells.

Figure 2: Two contingency tables showing fictitious data for 100 subjects. Each table shows the two predictor variables labeled PV1 and PV2 broken down into high and low scores. The table on the left shows the subjects who failed some phase of flight training while that on the right shows those who passed all phases of flight training.

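There are two predictor variables, PV1 and PV2, with two levels for each predictor variable called “Low” and “High.” The left-hand side of Equation (9) says to compute

$$2N \sum_{i=1}^{n} f_i \ln\!\left(\frac{f_i}{Q_i^A}\right) = (2 \times 100) \times \sum_{i=1}^{8} f_i \ln\!\left(\frac{f_i}{Q_i^A}\right).$$

What is model $M_A$ so that we can substitute values for the $Q_i^A$? It is up to us to choose whatever hypothesis we are interested in investigating. For starters, let’s pick the simplest hypothesis we can think of, that is, that all eight $Q_i$ are equal, so that each $Q_i^A = .125$. Now we can fill in the values for $Q_i^A$ based on this hypothesis:

$$
\begin{aligned}
2N \sum_{i=1}^{n} f_i \ln\!\left(\frac{f_i}{Q_i^A}\right)
&= (2 \times 100) \times \left[ .09 \ln\!\left(\frac{.09}{.125}\right) + .15 \ln\!\left(\frac{.15}{.125}\right) + \cdots + .17 \ln\!\left(\frac{.17}{.125}\right) \right] \\
&= 200 \times (-.02957 + .027348 + \cdots + .052272) \\
&= 200 \times .02784 \\
&= 5.57.
\end{aligned}
$$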

This value of 5.57 is compared to a $\chi^2$ distribution with $\nu = 7$ df. The critical value that cuts off the upper 5% of this $\chi^2$ distribution is 14.07. Therefore, 5.57 fits comfortably within this distribution and does not fall into the rejection region. We cannot reject model $M_A$ that says that the probability for a subject to fall into any of the eight categories is the same.
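The arithmetic quoted in this example can be checked directly. The sketch below uses only the numbers that appear in the text: the three relative frequencies shown in the worked equation (.09, .15, and .17), the eight-term sum of .02784, and the critical value 14.07. The remaining five cell frequencies come from Fig. 2 and are not reproduced here, so the sum is taken as given rather than recomputed.

```python
import math

N = 100
Q_A = 0.125  # every cell has the same probability under model M_A

# Only the first, second, and last relative frequencies are quoted in the text.
quoted_f = [0.09, 0.15, 0.17]
terms = [f * math.log(f / Q_A) for f in quoted_f]

# The eight-term sum is quoted as .02784; it is taken as given here.
quoted_sum = 0.02784
statistic = 2 * N * quoted_sum

CRITICAL_7_DF = 14.07  # upper 5% cutoff for 7 df, as quoted in the text
rejected = statistic > CRITICAL_7_DF
```

The three computed terms agree with the quoted -.02957, .027348, and .052272 to the precision shown, and the statistic of about 5.57 stays below the cutoff.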

The probability of failure is $Q_1 + Q_2 + Q_3 + Q_4 = .50$, which is the same as the probability of passing, $Q_5 + Q_6 + Q_7 + Q_8 = .50$. There is no effect due to the predictor variables either. There is the same probability of .25 of being categorized in the low or high level of either predictor variable no matter whether you pass or fail.

Now consider a second example as shown in Fig. 3. N remains at 100 subjects.

Figure 3: Two different contingency tables showing fictitious data for 100 subjects. Each table shows the two predictor variables labeled PV1 and PV2 broken down into high and low scores. The table on the left shows the subjects who failed some phase of flight training while that on the right shows those who passed all phases of flight training.

The value computed by Equation (9) for this second example is 34.62, which falls into the upper 5% region of the $\chi^2$ distribution with $\nu = 7$ df. Therefore, for these data, we must reject model $M_A$ that says all eight $Q_i = .125$.

What alternative model might fit the data better than model $M_A$? A casual inspection of Fig. 3 will reveal that the number of attritions is much less than the number of graduates. Let model $M_B$ posit that the ratio of passing to failing is 4:1 so that

$$Q_{\text{Pass}} = Q_5 + Q_6 + Q_7 + Q_8 = .80$$

and

$$Q_{\text{Fail}} = Q_1 + Q_2 + Q_3 + Q_4 = .20.$$

Otherwise, there are no further constraints on the $Q_i$. Within the pass and fail groups we want the $Q_i$ to be evenly spread out. This foreshadows the idea of maximum entropy to be introduced later in the report. The specification of model $M_B$ is shown below in Table 1, along with the previous $M_A$ and a new model, $M_C$, to be discussed shortly.


The value computed by Equation (9) for model $M_B$ is 1.62. The degrees of freedom must be adjusted downwards by 1 since we have introduced a new constraint. The critical value of the $\chi^2$ distribution for $\nu = 6$ df is 12.59, so we are well within the region where we would accept model $M_B$. The data do not allow us to reject the hypothesis that P(Pass) = .80 and P(Fail) = .20. However, by accepting model $M_B$, we still do not see any effects due to either of the predictor variables.
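Models $M_A$ and $M_B$ both spread a group total evenly over the cells in that group, which makes them easy to generate. A minimal sketch follows, assuming cells 1 to 4 form the fail group and cells 5 to 8 the pass group as in the text; the helper name `spread_evenly` is hypothetical.

```python
def spread_evenly(group_totals, cells_per_group):
    """Assign each group's total probability equally to the cells in that group."""
    q = []
    for total in group_totals:
        q.extend([total / cells_per_group] * cells_per_group)
    return q


# Group order is (fail, pass); each group has four cells.
q_a = spread_evenly([0.50, 0.50], cells_per_group=4)  # model M_A: every Q_i = .125
q_b = spread_evenly([0.20, 0.80], cells_per_group=4)  # model M_B: .05 fail, .20 pass
```

Here `q_b` gives each fail cell .05 and each pass cell .20, so the 4:1 ratio of passing to failing holds cell by cell as well as in total.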

For  a  third  and  final  numerical  example,  extending  the  insights  from  the  first  two  examples,  please  refer  to 
Fig. 4. For these data, model $M_A$ has a value of 145.70, so it is clearly rejected. The revised thinking incorporated into model $M_B$ is not much better at 115.47 and it too must be rejected.

We have to search for another plausible model that fits these new data. We will retain the hypothesis that $Q_{\text{Pass}} = .80$ and $Q_{\text{Fail}} = .20$ from model $M_B$. Within each of the two groups there appears to be a strong effect due to the predictor variables, PV1 and PV2. If we now attribute a strong theoretical impact for low PV1 and PV2 scores to predict failure and high PV1 and PV2 scores to predict success, then a model like model $M_C$ as shown in Table 1 might work.

Table 1: The specification of the eight $Q_i$ values for models $M_A$, $M_B$, and $M_C$.

Figure 4: The final two contingency tables showing fictitious data for 100 subjects. Each table shows the two predictor variables labeled PV1 and PV2 broken down into high and low scores. The table on the left shows the subjects who failed some phase of flight training while that on the right shows those who passed all phases of flight training.


We examine in detail how all of the constraints are satisfied by this model. First of all, $\sum_{i=1}^{8} Q_i$ must equal 1. The second constraint is that $Q_1 + Q_2 + Q_3 + Q_4 = .20$ and $Q_5 + Q_6 + Q_7 + Q_8 = .80$. Thirdly, low PV1 and PV2 scores are equal for the fail group, $Q_1 + Q_3 = Q_1 + Q_2$, and high PV1 and PV2 scores are equal for the pass group, $Q_6 + Q_8 = Q_7 + Q_8$. Notice that the rule for keeping as many $Q_i$ equal as possible is followed and that the ratio of 4:1 is followed as well as we move from the fail group to the pass group.
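The constraints listed above are easy to verify mechanically for any candidate assignment. In the sketch below the checker function is hypothetical, and so is most of the candidate vector. Only $Q_1 = .14$ and $Q_5 = .08$ are values the text quotes for model $M_C$. The other six entries are an assumption chosen to satisfy the stated constraints and should not be read as the contents of Table 1.

```python
def check_constraints(q, tol=1e-9):
    """Check the constraints described in the text for an eight-cell model.

    q[0..3] hold the fail cells Q_1..Q_4 and q[4..7] the pass cells Q_5..Q_8.
    """
    q1, q2, q3, q4, q5, q6, q7, q8 = q
    return {
        "sums_to_one": abs(sum(q) - 1.0) < tol,
        "fail_total_is_0.20": abs(q1 + q2 + q3 + q4 - 0.20) < tol,
        "pass_total_is_0.80": abs(q5 + q6 + q7 + q8 - 0.80) < tol,
        # Low PV1 (cells 1, 3) equals low PV2 (cells 1, 2) in the fail group.
        "fail_low_pv1_equals_low_pv2": abs((q1 + q3) - (q1 + q2)) < tol,
        # High PV1 (cells 6, 8) equals high PV2 (cells 7, 8) in the pass group.
        "pass_high_pv1_equals_high_pv2": abs((q6 + q8) - (q7 + q8)) < tol,
    }


# ILLUSTRATIVE candidate: only Q_1 = .14 and Q_5 = .08 come from the text;
# the remaining six values are assumed for the sake of the example.
q_candidate = [0.14, 0.02, 0.02, 0.02, 0.08, 0.08, 0.08, 0.56]
report = check_constraints(q_candidate)
```

Returning a dictionary rather than a single flag makes it clear which constraint a rejected candidate violates.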

Equation (9) produces a value of 6.84 for model $M_C$. The degrees of freedom must be reduced by one again to account for the added constraint. The critical value demarcating the 95% and 5% regions of the $\chi^2$ distribution for $\nu = 5$ df equals 11.07. This is a model we can accept. Table 2 summarizes the three models examined and their status for the data as given in Fig. 4.


Table  2:  Summary  of  the  three  models  examined  for  the  data  in  Fig.  4. 


Model    $\chi^2$    df    Status
$M_A$    145.70      7     Rejected
$M_B$    115.40      6     Rejected
$M_C$    6.84        5     Accepted



CALCULATING  THE  PROBABILITY  OF  EVENTS 

Once  we  have  found  the  most  conservative  model  that  can  be  accepted,  it  is  a  relatively  simple  matter  to  find 
the  probability  for  any  event  of  interest.  For  example,  it  is  usually  of  interest  to  calculate  the  probability  for 
attrition  as  a  function  of  the  predictor  variables.  A  particular  case  can  be  expressed  symbolically  as 


$$P(\text{Fail} \mid \text{PV1} = \text{low and PV2} = \text{high})$$


which  is  read  as  the  probability  of  failing  given  that  a  subject  scored  low  on  predictor  variable  one  and  scored  high 
on  predictor  variable  two. 


Probability  theory  provides  a  well-known  solution  for  this  situation.  Abstractly,  the  probability  of  event  A 
conditioned  on  the  truth  of  event  B  is  written 


$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad (10)$$

where $P(A \cap B)$ refers to the joint occurrence of events A and B. If the event A can be broken down into K mutually exclusive and exhaustive events, then the probability of the jth category of A is written as¹


$$P(A_j \mid B) = \frac{P(A_j \cap B)}{\sum_{i=1}^{K} P(A_i \cap B)}. \qquad (11)$$


In  the  case  that  concerns  us,  A  is  the  event  of  success  in  flight  training  and  it  is  broken  down  into  just  K  =  2 
categories,  Pass  or  Fail.  These  two  categories  are  mutually  exclusive  and  exhaustive.  That  is  to  say,  a  given 
subject  must  be  in  one  of  these  two  categories  and  given  that  he  or  she  is  in  one  of  the  two  categories,  she  or  he 
cannot  be  in  the  other  category.  The  conditioning  information  B  is  the  score  on  the  predictor  variable. 


Equation  (11)  can  now  be  rewritten  simply  as 

$$P(A_1 \mid B) = \frac{P(A_1 \cap B)}{P(A_1 \cap B) + P(A_2 \cap B)}. \qquad (12)$$


If we let event $A_1$ stand for fail, event $A_2$ for pass, and event B for a low score on predictor variable one, then Equation (12) becomes

$$P(\text{Fail} \mid \text{PV1} = \text{low}) = \frac{P(\text{Fail and PV1} = \text{low})}{P(\text{Fail and PV1} = \text{low}) + P(\text{Pass and PV1} = \text{low})}. \qquad (13)$$

Our acceptable model will then provide us with the probabilities to substitute into Equation (13).

Let us return to the first numerical example as depicted in Fig. 2. The numerator in Equation (13) is the intersection of the Fail cells with PV1 = low, which is $Q_1 + Q_3 = .25$. At this point, we need find only the second term in the denominator. This is the intersection of the Pass and PV1 = low cells, which is $Q_5 + Q_7 = .25$. Substituting these probabilities into Equation (13) yields

$$P(\text{Fail} \mid \text{PV1} = \text{low}) = \frac{.25}{.25 + .25} = .50.$$

However, this probability is the same as P(Fail) not conditioned on any information, i.e.,

$$Q_1 + Q_2 + Q_3 + Q_4 = .50. \qquad (14)$$

The extra information contained in the PV1 score was of no help whatsoever. It is irrelevant information.

¹ This is Bayes’s Formula written in another way.

The  same  tactic  just  outlined  can  be  employed  when  conditioning  on  any  number  of  predictor  variables.  Say 
that we are interested in the information provided by the scores on both PV1 and PV2:

$$P(\text{Fail} \mid \text{PV1} = \text{low and PV2} = \text{low}).$$

This is equal to

$$\frac{P(\text{Fail and PV1} = \text{low and PV2} = \text{low})}{P(\text{Fail and PV1} = \text{low and PV2} = \text{low}) + P(\text{Pass and PV1} = \text{low and PV2} = \text{low})}.$$

The cell that is the intersection in the numerator is $Q_1 = .125$, and the cell that is the intersection of the second term in the denominator is $Q_5 = .125$. Therefore,

$$P(\text{Fail} \mid \text{PV1} = \text{low and PV2} = \text{low}) = \frac{.125}{.125 + .125} = .50. \qquad (15)$$

So,  once  again,  the  information  from  both  predictor  variables  was  completely  irrelevant  or  useless.  Conditioning  on 
this  extra  information  did  not  change  the  probability  of  failing  from  what  we  knew  when  we  did  not  have  this 
information,  that  is,  P(Fail)  =  .50. 


What  about  the  second  numerical  example  as  illustrated  in  Fig.  3?  Does  the  extra  information  from  the 
predictor  variables  help  here?  The  same  formula  applies  so  all  we  have  to  do  is  plug  in  the  correct  values  for  the 
probabilities. In the second numerical example, model $M_B$ was found to be an acceptable model with the values $Q_1 = .05$ and $Q_5 = .20$:

$$P(\text{Fail} \mid \text{PV1} = \text{low and PV2} = \text{low}) = \frac{.05}{.05 + .20} = .20. \qquad (16)$$


However, P(Fail) = $Q_1 + Q_2 + Q_3 + Q_4 = .20$ as well. Here also the predictor variables are providing no useful
information  with  regard  to  the  probability  of  failing. 

In  the  third  example,  we  finally  do  observe  an  influence  on  the  probability  of  failing  by  knowing  the  scores  on 
the predictor variables. Refer back to Table 1 where the values of $Q_1 = .14$ and $Q_5 = .08$ for model $M_C$ are listed. In this case,

$$P(\text{Fail} \mid \text{PV1} = \text{low and PV2} = \text{low}) = \frac{.14}{.14 + .08} = .64. \qquad (17)$$

Knowing that a subject scored low on both PV1 and PV2 raised the probability of failing from .20 to .64. The scores on these predictor variables are valuable information that permits us to materially change our assessment of failing.
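The conditional probabilities worked out in this section all follow the pattern of Equation (12), so one small function covers them. In the sketch below the function name is hypothetical, and a model is represented as a mapping from cell number to $Q_i$. For model $M_C$ only the two cells quoted in the text ($Q_1 = .14$ and $Q_5 = .08$) are supplied, since those are the only ones this calculation needs.

```python
def p_fail_given(q, fail_cells, pass_cells):
    """Equation (12): P(Fail | B) = P(Fail and B) / [P(Fail and B) + P(Pass and B)].

    q maps 1-based cell numbers to probabilities; fail_cells and pass_cells
    list the cells that make up each joint event.
    """
    p_fail_and_b = sum(q[i] for i in fail_cells)
    p_pass_and_b = sum(q[i] for i in pass_cells)
    return p_fail_and_b / (p_fail_and_b + p_pass_and_b)


q_a = {i: 0.125 for i in range(1, 9)}                    # model M_A
q_b = {i: (0.05 if i <= 4 else 0.20) for i in range(1, 9)}  # model M_B
q_c_partial = {1: 0.14, 5: 0.08}                         # model M_C, quoted cells only

# PV1 = low: cells 1 and 3 (fail), cells 5 and 7 (pass).
p_a_pv1_low = p_fail_given(q_a, [1, 3], [5, 7])
# PV1 = low and PV2 = low: cell 1 (fail), cell 5 (pass).
p_a_both_low = p_fail_given(q_a, [1], [5])
p_b_both_low = p_fail_given(q_b, [1], [5])
p_c_both_low = p_fail_given(q_c_partial, [1], [5])
```

The first three results match the unconditional failure probability of their models, while the $M_C$ result rounds to the .64 shown in Equation (17).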

HOW  TO  FIND  MODELS  CONSISTING  OF  PRESCRIBED  INFORMATION 

We have nearly completed the quantitative overview for the analysis that we intend to carry out for isoperformance curves. One item still remains to be discussed, however. How does one manage to assign values to the $Q_i$ and thus arrive at plausible models?

In the numerical examples that were presented earlier, this task was not so difficult. The assigned values could be intuited without too much difficulty. But we really require some disciplined, systematic method for assigning the $Q_i$ that doesn’t depend upon someone’s intuitive insight. In this final section, we provide such a method for assigning the $Q_i$, a method that has a number of attractive features. The method is called the Maximum Entropy
Principle  (MEP),  and  it  permits  us  to  systematically  generate  only  the  models  with  the  known  information  that  we 
have  consciously  inserted  and  to  avoid  models  with  hidden  assumptions  about  information  we  do  not  wish  to  insert. 


The mathematical derivation behind the MEP will not be presented in this report; instead, we present here only the
formulas that show how one assigns values to the $Q_i$. A lengthy and thorough tutorial on this subject is available
in Volume II of my textbook [3]. The treatment in my book is based entirely upon the seminal work on the MEP by
Edwin T. Jaynes.


Equation  (18)  below  presents  the  simplest  form  of  the  MEP  where  only  one  piece  of  information  has  been 
inserted  into  a  model: 

$$Q_i = \frac{e^{\lambda_1 A_1(x_i)}}{\sum_{j=1}^{n} e^{\lambda_1 A_1(x_j)}} \tag{18}$$


where $\lambda_1$ is a constant value called a Lagrange multiplier. The value for $\lambda_1$ can be determined through numerical
methods. We shall use only a very simple trial and error technique to find $\lambda_1$. $A_1(x_i)$ is the notation for a
constraint on the $n$ values of the $Q_i$. As the argument $x_i$ indicates, there exists a separate value for each of the
$Q_i$ we are trying to assign. The denominator in Equation (18) consists of the sum over all $n$ possible cells, that is,
the sum of each possible term that could appear in the numerator. These remarks about the MEP formula will be
clarified by the numerical examples to follow.


As the first, and easiest, example of the MEP, consider the case where $\lambda_1 = 0$. This is the case where we are
inserting no information in the form of a constraint about the model. Actually this is not quite true. There is one
piece of information that is universally present in the MEP. This is the constraint that all $n$ of the $Q_i$ must sum to 1. This
constraint is universally present because all probability distributions must sum (or integrate) to 1. The essence of
the MEP is that the assignment of the $Q_i$ must have maximum entropy subject to the constraints imposed. When
the only constraint is that the $Q_i$ must sum to 1, the distribution with maximum entropy is found by
applying Equation (18) with $\lambda_1 = 0$:


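\begin{align}
Q_i &= \frac{e^{\lambda_1 A_1(x_i)}}{\sum_{j=1}^{n} e^{\lambda_1 A_1(x_j)}} \tag{19} \\
    &= \frac{e^{0 \times A_1(x_i)}}{\sum_{j=1}^{n} e^{0 \times A_1(x_j)}} \tag{20} \\
e^{0 \times A_1(x_i)} &= 1 \tag{21} \\
\sum_{j=1}^{n} e^{0 \times A_1(x_j)} &= n \tag{22} \\
Q_i &= \frac{1}{n}. \tag{23}
\end{align}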

This is exactly model $M_A$ that we assigned intuitively in the earlier numerical examples.
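
As a minimal sketch of Equation (18), the function below evaluates the single-constraint MEP formula and confirms that $\lambda_1 = 0$ yields $Q_i = 1/n$. The function name `mep_one_constraint` and the constraint values passed in are illustrative assumptions; with $\lambda_1 = 0$ any constraint values of length $n = 8$ give the same result.

```python
import math


def mep_one_constraint(lam1, a1):
    """Equation (18): Q_i = exp(lam1 * A1(x_i)) / sum_j exp(lam1 * A1(x_j))."""
    weights = [math.exp(lam1 * a) for a in a1]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative constraint values for n = 8 cells; they drop out when lam1 = 0.
a1 = [1, 1, 1, 1, 0, 0, 0, 0]
q_uniform = mep_one_constraint(0.0, a1)
print(q_uniform)
```

Every numerator equals 1 and the denominator equals $n$, so each of the eight cells receives probability 1/8.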

Let us see how we could arrive at model $M_B$ using the MEP. We now introduce a constraint as one piece of
information that we wish to insert into the assignment. That constraint is that $P(\text{Fail}) = .20$. Perhaps the easiest
way of writing this constraint is to place a 1 in the first four cells and a 0 in the last four cells as the values for
$A_1(x_i)$. Table 3 shows the subsequent computation of the $Q_i$ using Equation (18).

Table 3: The MEP assignment of the $Q_i$ for one constraint. This corresponds to model $M_B$.

Cell   $A_1(x_i)$   $\exp(\lambda_1 A_1(x_i))$   $Q_i$
1      1            .25                          .05
2      1            .25                          .05
3      1            .25                          .05
4      1            .25                          .05
5      0            1.00                         .20
6      0            1.00                         .20
7      0            1.00                         .20
8      0            1.00                         .20
Sum                 5.00                         1.00

$\lambda_1 = -1.3863$
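
The value of $\lambda_1$ in Table 3 can be recovered numerically. The sketch below uses bisection as a stand-in for the simple trial and error technique mentioned in the text; bisection is our substitution, not necessarily the procedure used to build the table, and the function names and search bounds are assumptions. It relies on the fact that the constraint average $\sum_i Q_i A_1(x_i)$ increases as $\lambda_1$ increases.

```python
import math


def mep_one_constraint(lam1, a1):
    """Equation (18) for a single constraint function A1."""
    weights = [math.exp(lam1 * a) for a in a1]
    total = sum(weights)
    return [w / total for w in weights]


def constraint_average(q, a1):
    """sum_i Q_i * A1(x_i); with A1 = 1 on the Fail cells this is P(Fail)."""
    return sum(qi * a for qi, a in zip(q, a1))


def solve_lam1(a1, target, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on lam1 (assumed bracket [lo, hi]) until the average hits target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if constraint_average(mep_one_constraint(mid, a1), a1) < target:
            lo = mid          # average too small: lam1 must increase
        else:
            hi = mid          # average too large: lam1 must decrease
    return (lo + hi) / 2.0


a1 = [1, 1, 1, 1, 0, 0, 0, 0]       # 1 in the four Fail cells, 0 elsewhere
lam1 = solve_lam1(a1, 0.20)         # constraint: P(Fail) = .20
q = mep_one_constraint(lam1, a1)
print(round(lam1, 4), [round(v, 2) for v in q])
```

The search settles on $\lambda_1 = \ln(.25) \approx -1.3863$, which reproduces the $Q_i$ column of Table 3.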

Table 4: The MEP assignment of the $Q_i$ for two constraints. This corresponds to model $M_C$.

Cell   $A_1(x_i)$   $A_2(x_i)$   $\exp(\lambda_1 A_1(x_i) + \lambda_2 A_2(x_i))$   $Q_i$
1      1            1            1.75                                              .14
2      1            0            .25                                               .02
3      1            0            .25                                               .02
4      1            0            .25                                               .02
5      0            0            1.00                                              .08
6      0            0            1.00                                              .08
7      0            0            1.00                                              .08
8      0            1            7.00                                              .56
Sum                              12.50                                             1.00

$\lambda_1 = -1.3863$, $\lambda_2 = 1.94593$

The MEP formalism can be extended straightforwardly to more than one constraint. An additional Lagrange
multiplier, $\lambda_2$, and constraint function, $A_2(x_i)$, are placed into Equation (18). This results in

$$Q_i = \frac{e^{\lambda_1 A_1(x_i) + \lambda_2 A_2(x_i)}}{\sum_{j=1}^{n} e^{\lambda_1 A_1(x_j) + \lambda_2 A_2(x_j)}}. \tag{24}$$

We can use Equation (24) to find models with two pieces of information inserted, and we can be sure that only
these two pieces are involved. An example of such a model with two constraints was model $M_C$. In this model,
we entertained the hypothesis that a predictor variable was associated with success in training in addition to a
given value for the probability of failing. Table 4 shows the numerical computations needed to assign the $Q_i$
values for this model ensuing from Equation (24).
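
The computation in Table 4 can be reproduced directly from Equation (24). The sketch below simply plugs in the two multipliers listed in the table rather than solving for them; the function name `mep_assign` and its list-based interface are illustrative assumptions.

```python
import math


def mep_assign(lambdas, constraints):
    """Equation (24): Q_i proportional to exp(sum_k lambda_k * A_k(x_i)).

    Returns the unnormalised exponential terms and the normalised Q_i.
    """
    n = len(constraints[0])
    weights = [
        math.exp(sum(lam * a[i] for lam, a in zip(lambdas, constraints)))
        for i in range(n)
    ]
    total = sum(weights)
    return weights, [w / total for w in weights]


a1 = [1, 1, 1, 1, 0, 0, 0, 0]   # A_1(x_i) column of Table 4
a2 = [1, 0, 0, 0, 0, 0, 0, 1]   # A_2(x_i) column of Table 4
weights, q = mep_assign([-1.3863, 1.94593], [a1, a2])

for cell, (w, qi) in enumerate(zip(weights, q), start=1):
    print(cell, round(w, 2), round(qi, 2))
print("P(Fail) =", round(sum(q[:4]), 2))
```

Rounded to two decimals, the printed columns agree with Table 4, and the Fail cells still sum to .20.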

All three constraints are satisfied by this assignment to the $Q_i$. The universal constraint that all $Q_i$ sum to 1 is
satisfied. The constraint inserted by model $M_B$ that the probability of failing is equal to .20 is satisfied. The
additional constraint inserted by model $M_C$ that low scores on both predictor variables lead to a higher probability
of failing and that high scores on both predictor variables lead to a higher probability of passing is satisfied. The
MEP also tells us the correct degrees of freedom for the $\chi^2$ test. It is $\nu = n - \text{number of constraints}$, which is
$\nu = 8 - 3 = 5$ for model $M_C$.

It is important to emphasize that this information and only this information has been inserted into the
assignment. This means that the information entropy of the assignment given to the $Q_i$ in Table 4 is the maximum
possible entropy given the constraints. There are other assignments to the $Q_i$ that satisfy all three constraints, but
they possess an entropy that is less than that of the MEP assignment.
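
The maximum entropy claim can be illustrated with a small numerical check. The sketch below compares the Table 4 assignment with two alternative assignments that we made up for illustration. The second constraint is written here as $Q_1 + Q_8 = .70$, a value read off Table 4 (.14 + .56) rather than stated in the text, so it should be treated as an assumption.

```python
import math


def entropy(q):
    """Information entropy H(Q) = -sum_i Q_i ln Q_i (natural logarithm)."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


def satisfies_constraints(q, tol=1e-9):
    """Check sum(Q) = 1, Q_1 + ... + Q_4 = .20, and (assumed) Q_1 + Q_8 = .70."""
    return (
        abs(sum(q) - 1.0) < tol
        and abs(sum(q[:4]) - 0.20) < tol
        and abs(q[0] + q[7] - 0.70) < tol
    )


mep = [0.14, 0.02, 0.02, 0.02, 0.08, 0.08, 0.08, 0.56]      # Table 4
# Hypothetical alternatives that meet the same three constraints.
alt_uneven = [0.14, 0.04, 0.01, 0.01, 0.10, 0.08, 0.06, 0.56]
alt_shifted = [0.10, 0.1 / 3, 0.1 / 3, 0.1 / 3, 0.2 / 3, 0.2 / 3, 0.2 / 3, 0.60]

for name, q in (("MEP", mep), ("uneven", alt_uneven), ("shifted", alt_shifted)):
    print(name, satisfies_constraints(q), round(entropy(q), 4))
```

Both alternatives pass the constraint check yet report a smaller entropy than the Table 4 assignment, as the principle requires.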

SUMMARY 

We have shown that a well-known formula from information theory can be derived from a Bayesian model
evaluation approach to contingency tables. The value computed by this function of the cross-entropy is compared
to a $\chi^2$ distribution to judge whether a proposed model is acceptable. Such models refer to probabilities for a flight
candidate being placed in a particular cell of a contingency table. Each cell represents an intersection of some
number of predictor variables and a criterion variable. Simple numerical examples illustrating this concept were
presented in this report. A follow-on report [2] will use the techniques developed here to analyze isoperformance
issues. In this practical application, PBI and API scores are used as the predictor variables and attrition in any
phase of flight training is employed as the criterion variable.

Appendix

Derivation  of  Information  Entropy  Formula  from  Bayesian  Model  Evaluation 


At the stage where we left the derivation in the earlier section of this paper, we had to find the prior probability
for any contingency table based on the truth of some given model. As we mentioned earlier, each model assigns
some definite value to each of the $Q_i$ values, $n$ in number. Each $Q_i$, remember, assigns a probability for a subject
to be categorized into the $i$th cell of the contingency table. The prior probability for the numbers appearing in any
contingency table is based on the multinomial formula. The prior probability for any contingency table based on
model $M_A$ is therefore,

$$P(F_j \mid M_A) = W(F_j)\,(Q_1^A)^{N_1} (Q_2^A)^{N_2} \cdots (Q_n^A)^{N_n}. \tag{25}$$

In the same manner, the prior probability for the same contingency table based on a different model, model $M_B$, is

$$P(F_j \mid M_B) = W(F_j)\,(Q_1^B)^{N_1} (Q_2^B)^{N_2} \cdots (Q_n^B)^{N_n}. \tag{26}$$

The symbol $W(F_j)$ refers to the multiplicity factor, the number of ways that each contingency table could be
formed without regard to the order that subjects are placed into the cells.
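
To make the multinomial formula concrete, the sketch below computes $W(F_j)$ and the table probabilities of Equations (25) and (26) for a small hypothetical table of counts, and shows that the multiplicity factor drops out of the ratio of the two probabilities. The counts and the function names are invented for illustration; model $M_A$ is taken as the uniform assignment and model $M_B$ as the assignment with $P(\text{Fail}) = .20$.

```python
import math


def multiplicity(counts):
    """W(F_j) = N! / (N_1! N_2! ... N_n!), the multinomial coefficient."""
    w = math.factorial(sum(counts))
    for c in counts:
        w //= math.factorial(c)   # exact at every step
    return w


def likelihood_product(counts, q):
    """prod_i Q_i ** N_i, the model-dependent part of the probability."""
    return math.prod(qi ** c for c, qi in zip(counts, q))


def table_probability(counts, q):
    """Equations (25)/(26): P(F_j | M) = W(F_j) * prod_i Q_i ** N_i."""
    return multiplicity(counts) * likelihood_product(counts, q)


counts = [2, 1, 1, 0, 4, 4, 4, 4]        # hypothetical table with N = 20
model_a = [1 / 8] * 8
model_b = [0.05] * 4 + [0.20] * 4

ratio_with_w = table_probability(counts, model_a) / table_probability(counts, model_b)
ratio_without_w = likelihood_product(counts, model_a) / likelihood_product(counts, model_b)
print(multiplicity(counts), ratio_with_w, ratio_without_w)
```

The two ratios agree up to floating-point rounding, which is why only the products of the $Q_i$ need to be carried forward.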

We  can  now  form  the  ratio  of  posterior  probabilities  for  the  two  models  as 


$$\frac{P(M_A \mid D)}{P(M_B \mid D)} = \frac{P(D \mid M_A)\,P(M_A)}{P(D \mid M_B)\,P(M_B)}. \tag{27}$$


Because  we  are  assuming  that  the  prior  probabilities  of  the  two  models  are  equal,  we  can  write  the  ratio  of 
posterior  probabilities  as  the  ratio  of  likelihoods: 


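\begin{align}
\frac{P(M_A \mid D)}{P(M_B \mid D)} &= \frac{P(D \mid M_A)}{P(D \mid M_B)} \tag{28} \\
&= \frac{P(F_j \mid M_A)}{P(F_j \mid M_B)} \tag{29} \\
&= \frac{W(F_j)\,(Q_1^A)^{N_1} (Q_2^A)^{N_2} \cdots (Q_n^A)^{N_n}}{W(F_j)\,(Q_1^B)^{N_1} (Q_2^B)^{N_2} \cdots (Q_n^B)^{N_n}}. \tag{30}
\end{align}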


The  multiplicity  factor  cancels  in  this  ratio  so  that 


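$$\frac{P(M_A \mid D)}{P(M_B \mid D)} = \frac{(Q_1^A)^{N_1} (Q_2^A)^{N_2} \cdots (Q_n^A)^{N_n}}{(Q_1^B)^{N_1} (Q_2^B)^{N_2} \cdots (Q_n^B)^{N_n}}. \tag{31}$$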


At this juncture, we bring in a classical theorem from non-Bayesian statistics, the asymptotic property of the
likelihood ratio test [1]. This theorem states that a quantity, $-2 \ln \lambda$, where $\lambda$ is a ratio of likelihoods as in
Equation (31), will be distributed according to the chi-square ($\chi^2$) distribution as $N \to \infty$ (see footnote 2). Jaynes's similar
Entropy Concentration Theorem [2] can also be invoked. This kind of transformation carried out on the posterior
probabilities of the two models will then be distributed as a $\chi^2$ distribution,


$$-2 \ln\left[\frac{P(M_A \mid D)}{P(M_B \mid D)}\right] \sim \chi^2(\nu \text{ df}). \tag{32}$$

Footnote 2: This use of $\lambda$ is to be distinguished from its use as the Lagrange multiplier.

Substituting the right-hand side of Equation (31) as the likelihood ratio, the transformation yields,


In Equation (35), we made use of the following identity

$$\ln\left(\frac{x}{y}\right) = \ln x - \ln y, \qquad -(\ln x - \ln y) = \ln y - \ln x = \ln\left(\frac{y}{x}\right).$$

If we let $Q_i^B$ stand for the very best model as a benchmark reference, then the $Q_i$ for model $M_B$ will be
exactly the same as the observed frequencies, $N_i/N$. Equation (36) now looks like this after making this choice
for model $M_B$,

$$2 \sum_{i=1}^{n} N_i \ln\left[\frac{N_i/N}{Q_i^A}\right]. \tag{37}$$


Now we want to get Equation (37) into a form that expressly shows the frequencies, $f_i = N_i/N$. Multiply and divide the
right-hand side of Equation (37) by $N$ to achieve


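$$2N \times \sum_{i=1}^{n} \frac{N_i}{N} \ln\left[\frac{N_i/N}{Q_i^A}\right] = 2N \sum_{i=1}^{n} f_i \ln\left[\frac{f_i}{Q_i^A}\right]. \tag{38}$$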


The summation term in Equation (38) is well-known in information theory as cross-entropy. We will give it the
notation $H(f, M_A)$ to indicate that it is the information cross-entropy of the actual frequencies with some model
for the $Q_i$, here Model A,

$$H(f, M_A) = \sum_{i=1}^{n} f_i \ln\left[\frac{f_i}{Q_i^A}\right]. \tag{39}$$


As our final statement, then, we see that any model can be accepted or rejected on the basis of its information
cross-entropy and where it falls in relation to a $\chi^2$ distribution

$$2N\,H(f, M) \sim \chi^2(\nu \text{ df}). \tag{40}$$
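
A minimal sketch of how Equation (40) might be applied is given below. The observed counts are hypothetical, the function names are our own, and the cutoffs are standard upper 5% critical values of the $\chi^2$ distribution hard-coded for 5, 6, and 7 degrees of freedom. The degrees of freedom follow the rule $\nu = n - \text{number of constraints}$ with $n = 8$, counting the universal constraint that the $Q_i$ sum to 1.

```python
import math

# Upper 5% critical values of the chi-square distribution (standard table values).
CHI2_UPPER_5PCT = {5: 11.070, 6: 12.592, 7: 14.067}


def cross_entropy(f, q):
    """Equation (39): H(f, M) = sum_i f_i ln(f_i / Q_i); empty cells add 0."""
    return sum(fi * math.log(fi / qi) for fi, qi in zip(f, q) if fi > 0)


def evaluate_model(counts, q, df):
    """Return the statistic 2 N H(f, M) and whether it exceeds the 5% cutoff."""
    n_total = sum(counts)
    f = [c / n_total for c in counts]
    statistic = 2.0 * n_total * cross_entropy(f, q)
    return statistic, statistic > CHI2_UPPER_5PCT[df]


counts = [30, 3, 5, 4, 15, 17, 14, 112]      # hypothetical data, N = 200
models = {
    "M_A": ([1 / 8] * 8, 7),                 # one constraint:  nu = 8 - 1
    "M_B": ([0.05] * 4 + [0.20] * 4, 6),     # two constraints: nu = 8 - 2
    "M_C": ([0.14, 0.02, 0.02, 0.02, 0.08, 0.08, 0.08, 0.56], 5),
}

for name, (q, df) in models.items():
    stat, rejected = evaluate_model(counts, q, df)
    print(name, round(stat, 2), "rejected" if rejected else "not rejected")
```

With these made-up counts, which were chosen to lie close to the $M_C$ cell probabilities, models $M_A$ and $M_B$ exceed their cutoffs while $M_C$ does not.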

We will adopt the usual criterion for rejection of a proposed model on the basis of whether $2N\,H(f, M)$ falls into
the upper 5% region of the $\chi^2$ distribution